Detecting Outliers in Categorical Record Databases Based on Attribute Associations
Authors
Abstract
Outlier detection, a data mining technique for detecting rare events, deviant objects, and exceptions in data, has been drawing increasing attention in recent years. Most existing outlier detection algorithms focus on numerical data sets. We target categorical record databases and detect records in which many attribute values are not observed even though they should occur in association with other attribute values in the records. To detect such records as outliers, we provide an outlier degree, which demonstrates sufficient detection performance in accuracy evaluation experiments compared with the probabilistic approach used in a related work. We also propose an efficient algorithm for detecting such outlier records. Experiments using real data sets show that our method detects interesting records as outliers.
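The idea in the abstract, scoring a record by how many expected attribute values are missing given the values it does contain, can be illustrated with a minimal sketch. This is not the authors' outlier degree or their algorithm, which the abstract does not define. It assumes simple single-antecedent association rules between attribute=value pairs, and the names `mine_rules`, `violation_score`, `min_support`, and `min_confidence` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, min_support=0.3, min_confidence=0.8):
    """Mine rules x -> y between (attribute, value) items.

    Assumption: only single-antecedent rules are considered, which is a
    simplification chosen for this sketch, not the paper's rule model.
    """
    n = len(records)
    item_count = Counter()
    pair_count = Counter()
    for rec in records:
        # Sort items so each co-occurring pair is counted under one key.
        items = sorted(rec.items())
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), cnt in pair_count.items():
        if cnt / n < min_support:
            continue  # the pair is too rare to support a rule
        for x, y in ((a, b), (b, a)):
            if cnt / item_count[x] >= min_confidence:
                rules.append((x, y))
    return rules


def violation_score(record, rules):
    """Fraction of applicable rules whose consequent is absent from the record.

    Assumption: this ratio is an illustrative stand-in for an outlier degree.
    """
    items = set(record.items())
    applicable = [(x, y) for x, y in rules if x in items]
    if not applicable:
        return 0.0
    violated = sum(1 for _, y in applicable if y not in items)
    return violated / len(applicable)


# Toy data (invented for illustration): nine consistent records, one odd one.
records = [{"country": "JP", "currency": "JPY"} for _ in range(9)]
records.append({"country": "JP", "currency": "USD"})

rules = mine_rules(records)
scores = [violation_score(rec, rules) for rec in records]
```

Records with a higher score break more of the associations that hold in the rest of the data. A threshold on the score, or a ranking by score, would then select the outlier candidates.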
Similar resources
Outlier Analysis of Categorical Data using NAVF
Introduction: Outlier analysis is an important research field in many applications, such as credit card fraud, intrusion detection in networks, and the medical field. This analysis concentrates on detecting infrequent data records in a dataset. Most existing systems concentrate on numerical or ordinal attributes. Sometimes categorical attribute values can be converted into numerical valu...
Detecting Outliers in Exponentiated Pareto Distribution
In this paper, we use two statistics for detecting outliers in the exponentiated Pareto distribution. These statistics are extensions of the statistics for detecting outliers in exponential and gamma distributions. In fact, we compare the power of our test statistics based on the simulation study and identify the better test statistic for detecting outliers in the exponentiated Pareto distribution. At t...
On detecting spatial categorical outliers
Spatial outlier detection is an important research problem that has received much attention in recent years. Most existing approaches are designed for numerical attributes, but are not applicable to categorical ones (e.g., binary, ordinal, and nominal) that are popular in many applications. The main challenges are the modeling of spatial categorical dependency as well as the computational effi...
An Efficient Algorithm for Outlier Detection in High Dimensional Real Databases
Detecting outlier patterns in data has been an important research topic in the statistics, data mining, and machine learning communities for many years. Research in identifying effective solutions to this problem has several interesting applications in a myriad of domains ranging from data cleaning to financial fraud detection and from network intrusion detection to clinical diagnosis of diseases. ...
Attribute-based Access Control for Cloud-based Electronic Health Record (EHR) Systems
An electronic health record (EHR) system facilitates integrating patients' medical information and improves service productivity. However, user access to patient data in a privacy-preserving manner is still a challenging problem. Many studies are concerned with security and privacy in EHR systems. Rezaeibagha and Mu [1] have proposed a hybrid architecture for privacy-preserving accessing patient records...
Journal title:
Volume, issue
Pages: -
Publication date: 2008